ABSTRACT Decomposing proteins into "molegos,"
building blocks that are conserved in sequence
and 3D-structure, can identify functional
elements. To demonstrate the specificity of the 
decomposition method, the PCPMer program 
suite was used to numerically define physical 
chemical property motifs corresponding to the 
molegos that make up the metal-containing active 
sites of three distinct enzyme families, from the 
dimetallic phosphatases, DNase 1 related nucle- 
ases/phosphatases, and dioxygenases. All three 
superfamilies bind metal ions in a β-strand core
region but differ in the number and type of ions 
needed for activity. The motifs were then used to 
automatically identify proteins in the ASTRAL40 
database that contained similar motifs. The pro- 
teins with the highest PCPMer score in the data- 
base were primarily metal-binding enzymes that 
were related in function to those in the alignment 
used to generate the PCPMer motif lists. The pro- 
teins that contained motifs similar to the dioxyge- 
nases differed from those found with PCP-motifs 
for phosphatases and nucleases. Relatively few 
metal-binding enzymes were detected when the 
search was done with PCP-motifs defined for in- 
terleukin-1 related proteins, which have a 
β-strand core but do not bind metal ions. While
the box architecture was constant in each super- 
family, the specificity for the metal ion preferred 
for enzymatic activity is determined by the pattern
of carbonyl, hydroxyl or imidazole groups in
key positions in the molegos. These results have 
implications for the design of metal-binding en- 
zymes, and illustrate the ability of the PCPMer 
approach to distinguish, at the sequence level, 
structural and functional elements. Proteins 
2005;58:200-210. © 2004 Wiley-Liss, Inc.
INTRODUCTION

Metalloenzymes must maintain a delicate balance, binding
ions tightly enough to retain them in the biological
environment, while simultaneously allowing sufficient free
sites for reactant binding. 1 The active sites of these
enzymes contain a flexible network of carbonyl, hydroxyl,
cysteinyl, and imidazole sidechains for inner shell coordination
of metal ions, while still allowing interactions with
the reactive groups of the substrates. 2,3 In previous work,
we used a novel, word-based approach to parse aligned 
protein sequences of the APE1 family of nucleases, which 
is a subfamily of the DNase 1 superfamily, into discrete 
sequence motifs. We named the conserved 3D-structural 
areas of these motifs "molegos," for protein building 
blocks. 4,5 Molegos in our usage are shorter and more
defined protein structure segments than the whole domains
referred to elsewhere as molecular legos. 6,7 Here we
show that these decomposition methods can be used to 
distinguish types of metal-binding enzymes in sequence 
databases. 

The first step in our procedure is to decompose aligned 
sequences of proteins into physical chemical property 
(PCP)-based motifs 8 with our MOTIFMAKER program. 4 
The motifs defined by MOTIFMAKER can be used by the 
MOTIFMINER program to scan databases to identify 
sequences with similar physical chemical properties. Struc- 
tural data can then be used to determine which motifs 
correspond to structural elements that are highly con- 
served in other proteins in a family or superfamily, and 
are, thus, generally used molegos. In our previous work, 
we used this technique to determine the molegos that were 
common to both a non-specific nuclease (DNase 1) and a 
specific one, apurinic/apyrimidinic endonuclease (APE1) 
from those distinct for APE1. This allowed us to discriminate
residues binding 3' to the damage site in APE1, which
were subsequently shown experimentally to be important
for mediating substrate-binding specificity and processivity. 5,9

In this report, we show that this approach can be used to 
distinguish homologues of metal-binding protein families 
in sequence databases. To explore how specifically metal- 
binding sites could be defined using our automated motif 
mining suite, PCPMer, we performed a decomposition 
analysis similar to that used for the DNase 1 family for two 
other well-studied families of metalloenzymes. The first 
are the di-metal ion centered phosphatases, which cata- 
lyze phosphorolytic cleavage of a variety of substrates. The 
second is the dioxygenases, a mono-metallic enzyme fam- 
ily involved in the oxidation of environmentally hazardous 
chemicals. 10 The dioxygenases are members of the (func- 
tionally extremely diverse) vicinal oxygen chelate (VOC) 
superfamily, which have similar metal-binding sites and 
common motifs, but bind different metal ions and sub- 
strates. 11 We defined PCP-motifs for the three enzyme 
families according to their physical chemical parameters, 
and used MOTIFMINER to scan the ASTRAL40 database 
to find proteins of known structure that contained similar 
sequences. This analysis revealed that the motifs in each 
case detected the enzymes in the initial alignment, and 
proteins with similar metal-binding properties and functions.
That is, the proteins with the highest PCPMer
scores, when the dioxygenase motifs were used to scan the 
database, were different from those found with the phos- 
phatase motifs. Further, motifs from the interleukin-1 
family of β-stranded growth factors, which are not known
to bind metal ions, revealed many proteins that are related 
to this growth factor and receptor family but relatively few 
metal-binding proteins. This indicates that the combined 
PCPMer program, coupled with structural analysis, can 
serve as a useful aid for identifying distantly related 
homologues of a protein family. 

METHODS 

Physical Chemical Property Motifs and PCPMer 

The PCPMer suite combines two programs, MOTIF- 
MAKER and MOTIFMINER. The MOTIFMAKER pro- 
gram, 4 an outgrowth of our MASIA program, 12 searches 
for areas in aligned protein sequences that are conserved 
according to their physical chemical properties, based on a 
set of five vectors (E1-E5) that were defined by multidimen- 
sional scaling of 237 physicochemical properties of amino 
acid side chains. 8 The output of MOTIFMAKER is a series 
of numerical matrices for each motif in the protein that 
define the type and degree of conservation of the physical 
chemical properties of each column in the original se- 
quence alignment. These matrices can then be used to 
automatically scan sequence databases, using the MO- 
TIFMINER program, to identify proteins that contain 
sequences similar to the PCP-motifs defined for the initial 
set of proteins. 4 Motifs can further be defined as "molegos,"
or molecular-building blocks, if their 3D-structure
is conserved in the members of a family or superfamily 
where the motif occurs. 
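
As a rough illustration of what "conserved according to physical chemical properties" can mean for a single alignment column, the sketch below bins a descriptor value for each residue and compares the column's distribution with a background distribution using relative entropy (Kullback-Leibler divergence). This is a minimal sketch under stated assumptions, not the MOTIFMAKER implementation: the descriptor values in `TOY_DESCRIPTOR` are invented for illustration and are not the published E1-E5 vectors, and the binning scheme, pseudocount, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical one-dimensional descriptor values, for illustration only.
# These are NOT the published E1-E5 vectors.
TOY_DESCRIPTOR = {
    "D": -1.0, "E": -0.9, "S": -0.2, "G": 0.0, "A": 0.1,
    "H": 0.2, "V": 0.5, "L": 0.6, "K": 0.8, "R": 0.9,
}


def bin_index(value, n_bins=4, lo=-1.0, hi=1.0):
    """Map a descriptor value onto one of n_bins equal-width bins."""
    frac = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def binned_distribution(residues, n_bins=4, pseudocount=0.5):
    """Smoothed frequency of each descriptor bin among the given residues."""
    counts = Counter(bin_index(TOY_DESCRIPTOR[r], n_bins) for r in residues)
    total = len(residues) + pseudocount * n_bins
    return [(counts.get(b, 0) + pseudocount) / total for b in range(n_bins)]


def relative_entropy(column, background, n_bins=4):
    """Kullback-Leibler divergence of a column's binned descriptor
    distribution from the background distribution."""
    p = binned_distribution(column, n_bins)
    q = binned_distribution(background, n_bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


background = list(TOY_DESCRIPTOR)            # one of each residue
conserved = relative_entropy("DDDD", background)
variable = relative_entropy("DHKL", background)
```

Under these assumptions a column of four aspartates scores much higher than a column of mixed residues, which is the behavior a conservation measure of this kind is meant to capture.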



Sequence Alignments 

Motifs and molegos are defined for protein families that 
are recognizable homologues of one another. Alignments 
based on sequence data alone, using methods such as 
CLUSTALW, can be used if the sequences are not too 
diverse (preferably between 30 and 80% identical) and 
there are few gaps or insertions. For more diverse se- 
quence families, such as those analyzed here, our previous 
work indicated that including structural information aids 
in properly aligning the sequences of known homologous 
proteins. Thus DALI 13 alignments of dimetallic phospha- 
tases, dioxygenases, or interleukin-1 (IL-1) related pro- 
teins of known structure, were used as input to the 
MOTIFMAKER program (the original alignments and 
motif lists are given as supplementary data). We checked 
these alignments and the motifs generated by visual 
analysis of the structures and by using expert analysis of 
the families published by other groups. 10,11,14-16

Sequence Decomposition

Sequence decomposition of the APE1 family and analy- 
sis of related motifs in other members of the DNase 1 
superfamily, using our MASIA tool (http://www.scsb.utmb.edu/masia/masia.html),
was described previously. 5,12
PCP-motifs for the DNase 1 superfamily were isolated 
from an alignment of 17 diverse members of the DNase 1 
superfamily (including 7 DNase 1 and 7 APEs from diverse 
species, and 3 IPPs of mammalian origin). PCP-motifs 
were extracted from the sequence alignments with the 
MOTIFMAKER subroutine of PCPMer (http://www.scsb.utmb.edu/PCPMer/), 4,8
using a specific entropy value of 1.25,
an allowed gap of 2, and a minimum length of 5 (the alignment,
the PCPMer motifs, and the scoring matrices for the motifs 
are given as supplementary data). 
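
The three parameters above (entropy cutoff 1.25, allowed gap 2, minimum length 5) can be read as rules for grouping conserved alignment columns into motifs. The sketch below shows one plausible reading, offered only as an assumption: columns scoring at or above the cutoff are linked into one segment when separated by at most `max_gap` lower-scoring columns, and segments shorter than `min_len` are discarded. The function name and the exact gap semantics are hypothetical and may differ from MOTIFMAKER.

```python
def find_motifs(scores, cutoff=1.25, max_gap=2, min_len=5):
    """Return inclusive (start, end) column ranges of candidate motifs.

    scores  : per-column conservation values (e.g. relative entropies)
    cutoff  : minimum value for a column to count as conserved
    max_gap : most consecutive sub-cutoff columns tolerated inside a motif
    min_len : minimum motif length in columns, gaps included
    """
    motifs = []
    start = last_hit = None
    for i, score in enumerate(scores):
        if score < cutoff:
            continue
        if start is None:
            start = i
        elif i - last_hit - 1 > max_gap:
            # The run of weak columns was too long: close the open segment.
            if last_hit - start + 1 >= min_len:
                motifs.append((start, last_hit))
            start = i
        last_hit = i
    if start is not None and last_hit - start + 1 >= min_len:
        motifs.append((start, last_hit))
    return motifs


# Toy per-column scores: two conserved stretches and one short island.
toy_scores = [2, 2, 0, 0, 2, 2, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2]
toy_motifs = find_motifs(toy_scores)
```

With the toy scores, columns 0-6 form one motif (the two weak columns inside it are within the allowed gap), the two-column island at 10-11 is dropped as too short, and columns 16-20 form a second motif.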

The 7 motifs that are common to the members of the 
DNase 1 superfamily are a subset of the 12 common to 
members of the APE subfamily. 4 To allow comparison with 
our previous report, the numbering of the molegos used in 
this study refers to the previously published list for the 
APE subfamily. 4,5 The APE1 motifs 1, 2, 7, 11, and 12
correspond to motifs 1, 2, and 5-7 for the alignment of the
DNase 1 superfamily. 

Motifs and molegos of the dimetallic phosphatases were 
defined in MOTIFMAKER using a DALI alignment of 4 
proteins of this superfamily of known structures. A sliding 
entropy definition was used and 18 motifs were defined. 
Motifs were defined similarly for a DALI alignment of 
three dioxygenase proteins that included the three metal- 
binding regions known to be similar in this family. Finally, 
a previously defined alignment of IL-1β homologues, all of
which contain a similar β-stranded core, 17 was used as a
non-metal-binding control for the PCPMer method. 

Database Searching 

The MOTIFMINER subroutine of PCPMer was then 
used to score proteins in the ASTRAL40 database 18,19
(versions 1.55 and 1.63) according to their similarities to the
PCP-motifs defined for the starting alignment. The ASTRAL40
database contains ~3,700 sequences of proteins,
representing nearly every unique protein structure in the
PDB. Protein scores can be derived in two ways, depending 
on the method chosen to determine a significant match. 
Where conservation is high, a cutoff value for significance 
can be specified (such as 0.7). Alternatively, a mean 
scoring system can be selected, to use the average score of 
the sequences in the starting alignment and that of all 
sequence windows in the database to determine a signifi- 
cance threshold. 
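
The cutoff-based variant of the search can be sketched as a sliding-window scan. The example below is a simplified stand-in, not the MOTIFMINER scoring function: it assumes each motif position is summarized by a mean and a spread for one toy descriptor, scores a residue by a Gaussian-shaped similarity to that position, averages over the window, and reports windows at or above the 0.7 cutoff mentioned above. The descriptor values, the profile, and the similarity function are all hypothetical.

```python
import math

# Hypothetical descriptor values for a few residues (illustration only).
TOY = {"D": -1.0, "G": 0.0, "H": 0.2, "L": 0.6}


def window_score(window, profile):
    """Average per-position similarity (between 0 and 1) of a window
    to a profile of (mean, spread) pairs."""
    sims = [
        math.exp(-0.5 * ((TOY[res] - mean) / spread) ** 2)
        for res, (mean, spread) in zip(window, profile)
    ]
    return sum(sims) / len(sims)


def scan(sequence, profile, cutoff=0.7):
    """Return (start, score) for every window scoring at or above cutoff."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = window_score(sequence[start:start + width], profile)
        if score >= cutoff:
            hits.append((start, round(score, 3)))
    return hits


# A toy three-position motif centred on D, H, L.
toy_profile = [(-1.0, 0.2), (0.2, 0.2), (0.6, 0.3)]
toy_hits = scan("GGDHLGG", toy_profile)
```

Only the window starting at position 2 ("DHL") clears the cutoff in this toy sequence. A mean-based threshold, the alternative described above, would replace the fixed `cutoff` with a value derived from the score distributions of the alignment and of the database windows.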

Molego pictures were drawn with MOLMOL 20 from the 
indicated PDB files. 

RESULTS 

Molego Architecture of Three Metal-Binding 
Protein Families 

Figure 1 shows representatives of the metal-containing 
active sites in the three enzyme superfamilies compared in 
this study, for the DNase 1 superfamily, the dimetallic 
phosphatases, and the dioxygenases. The molegos, in this 
case the conserved β-strands that make up the three sites,
differ considerably between the three types of metalloen- 
zymes in their topology and the relative location of the 
metal ion(s). One representative structure is shown for 
each of the three metalloenzyme groups discussed in this 
report. The first structure, for human APE1, represents 
the DNase 1 topology. We previously observed that the 
β-strand core of all enzymes of the DNase 1 superfamily is
highly conserved 5 and particularly the five molegos that 
form the antiparallel active center of the enzymes. 4 For 
example, this area in inositol 5' polyphosphate phospha- 
tase, synaptojanin, a distantly related member of the 
DNase 1 fold family, is similar in sequence and geometry 
with that of APE1. The second molego drawing, for 5' 
nucleotide phosphatase, represents the dimetallic phospha- 
tase family, which contains two metal ions in the active 
site. Again, five β-strands make up its active center, but
the overall topology is distinct from the monometallic 
APE1 site. The third β-core, for 2,3-dihydroxybiphenyl
1,2-dioxygenase, differs considerably from the other two
structures in that the metal ion is bound in the middle of
the β-strands, not at the ends.

Scanning of the proteins in the ASTRAL40 database 
with MOTIFMINER 4,8 revealed several metalloenzymes 
that contained sequence elements similar to the PCP- 
motifs of the DNase 1 superfamily (Table I and Mathura et 
al. 4 ). Among these were many nucleases, RNA and nucleo- 
tide-binding proteins, and proteins with metal-binding 
capability. To determine the selectivity of the PCPMer 
approach, we did a similar structural decomposition and 
database search for the two other metal ion-binding 
superfamilies, and (as control), for the IL-1 family of 
proteins that have a similar β-stranded core but no known
metal-binding capability. 

PCP-Motifs of the Dimetallic Phosphatases Detect 
Other Phosphatases 

PCP-motifs for dimetallic phosphatases were identified 
by the MOTIFMAKER program in a DALI alignment of 
the sequences of four dimetallic phosphatases of known 



structure. The PCP-motifs were checked by comparing 
them to previously identified sequence motifs for one of the 
sequences, a representative of the nucleotide 5' phospha- 
tase family of proteins. 14 The molegos in the metal boxes of 
these enzymes are conserved across the superfamily, 
which includes enzymes with such diverse function as the 
DNA repair enzyme Mre11 (PDB 1II7; SCOP d.159.1.4), pig
acid phosphatase (1UTE, SCOP d.159.1.1), and λ-phage
serine/threonine protein phosphatase (1G5B, SCOP
d.159.1.3) (Schein et al., forthcoming). The PCP-motifs 
that were defined for the phosphatases with MOTIF- 
MAKER were then used to scan the ASTRAL40 database 
for sequences containing similar regions. MOTIFMINER 
results (Table II) show that the highest scoring proteins 
were the dimetallic phosphatases in the initial alignment, 
as well as a closely related protein phosphatase that was 
not included in that alignment. The other high-scoring 
proteins in this search were metalloenzymes that were 
similar in function to those of the starting alignment, and 
different from those found with the DNase 1 superfamily 
motifs (Table I). 

The Dioxygenases Have a Different Metal Ion 
Catalytic Center 

To further determine the specificity of the PCPMer 
methodology, we decomposed the aligned sequences of a 
family of metalloenzymes that are not functionally re- 
lated to the DNase 1 or the dimetallic phosphatase 
superfamilies. We chose the dioxygenases, a family 
within the vicinal oxygen chelate (VOC) superfamily of 
metalloenzymes that catalyze oxidative cleavage of C-C 
bonds, isomerizations, epimerizations, and nucleophilic 
substitutions. The motifs that characterize this superfamily
have been shown to form a βαβββ structural unit
in the metal-containing active center. 11 Compared to 
the phosphatases and nucleases, there are fewer protein 
ligands to the metal ion in the VOCs, presumably to 
allow tighter coordination between the substrate and 
the metal ion during the formation of the enolic inter- 
mediate. 16 The isolated molegos (Fig. 2) show how the 
β-strands of the dioxygenase metal site are conserved,
regardless of the metal bound. Table III compares the 
sequence conservation of these three elements. While 
the first molego sequence is more variable, the other two 
are well conserved according to their physical chemical 
properties. The highlighted amino acids, made clear by 
the molego-based alignment in Table III, also illustrate 
how a small change (H to E in motif 1) may indicate 
selectivity for Zn 2+ in 1QIP. However, the pattern of 
change in the amino acids is not yet fully quantified, as 
will be discussed below for the ensemble of proteins, and 
will require observing the coordination spheres of more 
proteins in this family. 

The sequences of the three elements were defined as 
PCP-motifs using MOTIFMAKER and these were used 
to scan the ASTRAL40 database. MOTIFMINER rap- 
idly identified the three proteins in the initial alignment 
within the top 20 proteins (Table IV). The intervening
proteins with similar PCPMer scores were predominantly
metal binding, and included many oxidases.
The list of metal-binding proteins identified starting 
from the dioxygenases was distinct from those found by 
scanning the database with the conserved PCP-motifs of 
the DNase 1 (Table I) and dimetallic phosphatases (Ta- 
ble II). 

Using PCP-Motifs to Identify β-Strand Proteins
That Are Not Metalloenzymes 

About one third of all proteins bind metal, 21 and novel 
metal-binding sites have also been found by structure 
analysis. 22 Many, but not all metal-binding sites 23,24 in 
metalloenzymes are composed of β-strands. To determine
whether PCPMer was only recognizing the sequence patterns
for β-strand formation and not metal-binding sites,
we isolated PCP-motifs from a structure-based alignment
of proteins related to interleukin-1 (IL-1). 17 These proteins
all have a β-strand core but are not known to bind metal
ions. The highest scoring proteins related to this family in 



Fig. 1. Molego representations of the metal-containing active site
regions of three different metalloenzyme families. A: From the structure of
human APE1 with Mn2+ (PDB file 1DE9), a representative of the DNase 1
related nucleases and phosphatases; B: 5' nucleotide phosphatase with
two Zn2+ (PDB file 1USH), a representative of the dimetallic phosphatases;
C: 2,3-dihydroxybiphenyl 1,2-dioxygenase with Fe(II) (PDB file
1HAN), a representative of the dioxygenase family. The molego segments
are shown in ribbon format (corresponding to their conserved
secondary structures across a family or superfamily), including the side
chains of key residues near the metal ions.

Fig. 2. Metal-binding molegos in three Fe(II) binding dioxygenases
(1MPY: catechol 2,3-dioxygenase; 1HAN: 2,3-dihydroxybiphenyl 1,2-dioxygenase;
1CJX: 4-hydroxyphenylpyruvate dioxygenase) and another
member of the vicinal oxygen chelate superfamily (VOC) that binds zinc
(1QIP, human glyoxalase). Note the metal ion binding residues are within
the β-strands, rather than projecting above them, and that the structure is
constant while the residues that bind the metal ion dictate the specificity.
TABLE I. Highest Scoring Proteins in the ASTRAL40 Database of Representative PDB Files Selected by PCPMer,
Using the Motif Profile of the DNase 1 Superfamily†

PCPMer score a | PDB ID | SCOP | EC number | Bound ion | Description
 | | | | Mg2+ |
1604 | 1AKO | d.151.1.1 b | 3.1.11.2 | Mg2+, Mn2+ | DNA-repair enzyme exonuclease III (E. coli) c
1501 | 1HD7 | d.151.1.1 b | 4.2.99.18 | Mg2+, Mn2+ | DNA repair endonuclease Hap1 (Hu) c
1472 | 1I9Z | d.151.1.2 b | hydrolase | Ca2+ | Synaptojanin, IPP5C domain (Yeast (Schizosaccharomyces pombe)) c
 | | | Nuc trans | Mg2+ |
1371 | 2BCE | c.69.1.1 | 3.1.1.13 | taurocholate | Bile-salt activated lipase (cholesterol esterase) (Cow (Bos taurus))
1365 | 1QQQ | d.117.1.1 | 2.1.1.45 | Nucleotide | Thymidylate synthase (E. coli)
1364 | 1GQI | c.1.8.10 | 3.2.1.139 | Co2+, Mg2+ | (A: 152-712) alpha-D-glucuronidase catalytic domain (Pseudomonas cellulosa)
1355 | 1E4M | c.1.8.4 | 3.2.3.1 | Zn2+ | Plant beta-glucosidase (myrosinase) (White mustard (Sinapis alba))
1352 | 1F8M | c.1.12.6 | 4.1.3.1 | Mg2+ | Isocitrate lyase (Mycobacterium tuberculosis)
1350 | 1GPI | b.29.1.10 | 3.2.1.91 | | Cellobiohydrolase I (Cel7d)
1340 | 1I50 | e.29.1.1 | 2.7.7.6 | Mg2+, Ca2+ | RBP1 (S. cerevisiae)
1339 | 1H08 | a.118.1.9 | 3.6.1.34 | | (S. cerevisiae))
1334 | 3BTA | d.92.1.7 | 3.4.24.69 | Zn2+ | (A: 1-546) Botulinum neurotoxin (Clostridium botulinum, serotype A)
1319 | 1C8D | | Viral protein | Ca2+ | familiaris))
1315 | 1FBN | c.66.1.3 | Ribosome | RNA | Fibrillarin homologue (Archaeon Methanococcus jannaschii)
1300 | 1QFX | c.60.1.3 | 3.1.3.8 | PO4 3- | Phytase (myo-inositol-hexakisphosphate-3-phosphohydrolase) (Aspergillus niger)
1289 | 1M1X | b.69.8.1 | Nuc. transporter | Mn2+ | (A: 1-438) Integrin alpha N-terminal domain (Hu)
1288 | 1K06 | b.119.1.1 | Transferase | RNA binding | C-terminal autoproteolytic domain of nucleoporin nup98 (Hu)
1287 | 1CLC | a.102.1.2 | 3.2.1.4 | Ca2+, Zn2+ | (135-575) CelD cellulase, C-terminal domain (Clostridium thermocellum)
1282 | 1D1Q | c.44.1.1 | 3.1.3.48 | | Tyrosine phosphatase (Baker's yeast (S. cerevisiae))
1279 | 1QAZ | a.102.3.1 | 3.5.1.45 | SO4 2- | Alginate lyase A1-III (Sphingomonas sp., A1)
1271 | 1QQ9 | c.56.5.4 | 3.4.11 | Ca2+, Zn2+ | Aminopeptidase (Streptomyces griseus)
1268 | 1QQ1 | b.80.1.6 | Viral protein | Cu2+ | P22 tailspike protein (Salmonella phage)
1266 | 1A2V | b.30.2.1 | 1.4.3.6 | | (A:237-672) Copper amine oxidase, domain 3 (catalytic) (Hansenula polymorpha)
1254 | 1FIU | c.52.1.10 | 3.1.21.4 | Mg2+ | Restriction endonuclease NgoIV (Neisseria gonorrhoeae)
1248 | 1AYX | a.102.1.1 | 3.2.1.3 | | Glucoamylase (Saccharomycopsis fibuligera)
1247 | 1BHE | b.80.1.3 | 3.2.1.15 | | Polygalacturonase (Erwinia carotovora)
1238 | 1BVY | c.23.5.1 | 1.14.4.1 | | FMN-binding domain of the cytochrome P450bm-3 (Bacillus megaterium)
1238 | 1M1N | c.92.2.3 | 1.18.6.1 | Fe2+, MoO4 2- | Nitrogenase iron-molybdenum protein, alpha chain (Azotobacter vinelandii)
1237 | 1FN9 | d.196.1.1 | Viral protein | Zn2+, dsRNA | Outer capsid protein sigma 3 (Reovirus)
1237 | 1N1T | b.68.1.1 | 3.2.1.18 | | (A: 1-406) Trypanosoma rangeli sialidase
1234 | 1F46 | d.129.4.1 | Cell cycle | | Cell-division protein ZipA, C-terminal domain (E. coli)
1221 | 1USH | d.159.1.2 | 3.1.3.5 | Zn2+ | (26-362) 5'-nucleotidase (syn. UDP-sugar hydrolase), N-terminal domain (E. coli)
1216 | 1CJA | d.144.1.3 | Transferase | AMP binding | Actin-fragmin kinase, catalytic domain (Slime mold (Physarum polycephalum))
1208 | 2SHP | c.45.1.2 | 3.1.3.48 | PO4 3- | (A:219-525) Tyrosine phosphatase (Hu, shp-2)

† Motif profile was generated by PCPMer with a relative entropy of 1.25, a gap of 2, and a minimum length of 5.
a PCPMer uses a Bayesian scoring function to determine proteins that contain the highest scoring matching motifs.
b The SCOP class, d.151.1, is for the DNase 1 superfamily.
c Sequences in the initial alignment are bold.
d Europium ions used to obtain phase data may demarcate calcium binding sites.
TABLE II. The Proteins in the ASTRAL40 Database (Version 1.63) That Most Closely Match the PCP-Motifs of the
Dimetallic Phosphatases Are Predominantly Metal-Binding Proteins†

PCPMer score | PDB code | SCOP | EC number | Bound ion | Description
5519 | 1UTE | d.159.1.1 | 3.1.3.2 | Fe2O 4+ | Purple acid phosphatase (Pig)
5420 | 1USH | d.159.1.2 | 3.1.3.5 | Zn2+, SO4 2-, CO3 2- | 5'-nucleotidase (syn. UDP-sugar hydrolase), N-terminal domain (E. coli) (26-362)
5284 | 1AUI | d.159.1.3 | 3.1.3.16 | Ca2+, Fe3+, Zn2+ | Ser/thr phosphatase-2B (PP-2B, calcineurin A subunit) (Hu)
5283 | 1II7 | d.159.1.4 | Replication | Mn2+, PO4 3-, SO4 2- | Mre11 (Archaeon Pyrococcus furiosus)
5279 | 1M7S | e.5.1.1 | 1.11.1.6 | Heme | I (Pseudomonas syringae)
5217 | 1G5B | d.159.1.3 | 3.1.3 | Hg2+, Mn2+, SO4 2- | ser/thr protein phosphatase (Bacteriophage λ)
5163 | 1LI5 | c.26.1.1 | | Zn2+ | Cysteinyl-tRNA synthetase (A: 1-315) (E. coli)
5150 | 2BCE | c.69.1.1 | 3.1.1.13 | Taurocholate | Bile-salt activated lipase (cholesterol esterase) (Cow (Bos taurus))
5139 | 1IAT | c.80.1.2 | 5.3.1.9 | SO4 2- | Phosphoglucose isomerase, PGI (Hu)
5130 | 1RKD | c.72.1.1 | 2.7.1.15 | ADP, PO4 3- | Ribokinase (E. coli)
5125 | 1I50 | e.29.1.2 | 2.7.7.6 | Zn2+, Mn2+ | RNA Polymerase II (S. cerevisiae)
5122 | 1F6D | c.87.1.3 | 5.1.3.14 | Na+, Cl-, UDP | UDP-N-acetylglucosamine 2-epimerase (E. coli)
5117 | 1MJG | e.26.1.3 | 1.2.99.2 | Cu1+, Ni2+, Fe4S4 | Bifunctional carbon monoxide dehydrogenase/acetyl-CoA synthase α-subunit (Moorella thermoacetica)
5113 | 1AOZ | b.6.1.3 | 1.10.3.3 | Cu2+, Cu-O-Cu | Ascorbate oxidase (Zucchini; A: 339-552)
5099 | 1IW7 | e.29.1.2 | 2.7.7.6 | Mg2+, Pb2+ | RNA-polymerase beta-prime (Thermus thermophilus)
5095 | 1EHK | f.24.1.1 | 1.9.3.1 | Dinuclear Cu, | Bacterial ba3 type cytochrome c oxidase subunit I (Thermus thermophilus)
5094 | 1JBO | f.29.1.1 | Photosynthesis | Ca2+, Fe4S4 | Apoprotein a1, PsaA (Synechococcus elongatus)
5091 | 1PRE | f.8.1.1 | Toxin | | (Pro)aerolysin (Aeromonas hydrophila) (85-470)
5083 | 1FLG | b.70.1.1 | Oxidoreductase | Ca2+, PQQ | Ethanol dehydrogenase (Pseudomonas aeruginosa)
5074 | 1QLW | c.69.1.15 | Hydrolase | SO4 2- | Bacterial esterase (Alcaligenes sp.)
5071 | 1PBG | c.1.8.4 | 3.2.1.85 | | 6-phospho-beta-D-galactosidase, PGAL (Lactococcus lactis)
5071 | 1G8K | c.81.1.1 | Oxidoreductase | Mo4+, FeS, Hg2+, Ca2+ | Arsenite oxidase large subunit (Alcaligenes faecalis) (A:4-682)
5070 | 1A9X | c.23.16.1 | Amidotransferase | K+, Cl-, Mn2+, PO4 3- | Carbamoyl phosphate synthetase, small subunit C-terminal domain (E. coli) (B:1653-1880)

† The first part of the PCPMer results file for proteins in the ASTRAL40 database (version 1.63) is shown. A DALI alignment of 4 dimetallic
phosphatases (highlighted in bold) was used to define 15 motifs in MOTIFMAKER using a sliding relative entropy scale [range (0.5-1.7) step 0.1,
gap cutoff 1, length cutoff 6]. The database search was done in MOTIFMINER, with the matching sequences scored with a cutoff value of 0.7 (i.e.,
only sequences with a score of 0.7 or higher to a given motif would be considered a match).



TABLE III. Sequences of the Fe(II) Binding Motifs (See the corresponding molegos
in Fig. 2) of the Dioxygenase Family (1MPY, 1HAN, 1CJX)†

PDB ID | Motif 1 | Motif 2 | Motif 3
1MPY | 152 DHALMYG 158 | 211 R-LHHVSFHL 219 | 260 SGNRNEVFCGG 271
1HAN | 145 GHFVRCV 151 | 207 R-IHHFMLEV 215 | 255 SGVEVEYGW- 263
1CJX | 159 DHLTHNV 166 | 236 EGIQHVAFLT 245 | 316 GDVFFEFIQRK 327
1QIP | 97 LELTHNW 103 | 122 RGFGHIGIAV 131 | 167 DGYWIEILN- 175

† The corresponding area of another member of the vicinal oxygen chelate family, 1QIP, a lyase with a
different metal binding specificity (Zn) is shown for comparison. Residues that are in direct contact with
the metal ion are bold.



the ASTRAL40 database (Table V) were those in the initial 
alignment, cell surface receptors and viral structural 
proteins. Of the top 30 scoring proteins, only 8 were 
metalloenzymes, and several others contained metal ions 
that functioned in lattice formation. In comparison, for the 
phosphatase analysis, 17 of the top 24 sequences were 
metalloenzymes (Table II), as were 14 of the 23 highest 
scoring proteins for the dioxygenase motifs (Table IV). Thus, 
the program can distinguish different functional types of 
enzymes, and not just secondary structure regions. 
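
The comparison above rests on three simple proportions taken from the text (8 of 30 for the IL-1 control, 17 of 24 for the phosphatase motifs, 14 of 23 for the dioxygenase motifs). The short calculation below only restates that arithmetic; the dictionary labels are chosen here for readability.

```python
# (metalloenzymes among top hits, number of top hits considered)
counts = {
    "IL-1 control": (8, 30),
    "dimetallic phosphatases": (17, 24),
    "dioxygenases": (14, 23),
}

# Fraction of top-scoring proteins that are metalloenzymes, per search.
fractions = {name: hits / total for name, (hits, total) in counts.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

This gives roughly 27% for the non-metal-binding control against roughly 71% and 61% for the two metalloenzyme searches.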



Correlating the Key Amino Acid(s) Presented by 
the Molego with Metal Ion Choice 

One reason that MOTIFMINER is able to identify met- 
al-binding motifs is that it uses not just the average value 
of residues in a column of the original alignment, but also 
the "relative entropy," a measure of the residue variabil- 
ity, in scoring motifs in the database sequences. The most 
conserved amino acids in the motifs described here are 
indeed those in direct contact with the metal ion, and 
TABLE IV. Proteins Identified in the ASTRAL40 Database That Contain Regions With Significant Similarity to the
PCP Motifs That Define the Metal-Binding Site of Dioxygenases

PCPMer score | PDB code | SCOP | EC number | Bound ion | Description
142.06 | 1F03 | a.102.2.1 | 3.2.1.24 | Ca2+ | Hu Class I alpha-1,2-mannosidase, catalytic domain
140.77 | 1LVK | c.37.1.9 | Contractile protein | Mg2+ | Myosin S1, motor domain (Dictyostelium discoideum)
140.28 | 1HAN | d.32.1.3 | 1.13.11.39 | Fe2+ | 2,3-Dihydroxybiphenyl dioxygenase a
140.28 | 1CJX | d.32.1.3 | 1.13.11.27 | Fe2+ | 4-hydroxyphenylpyruvate dioxygenase (Pseudomonas fluorescens) a
139.33 | 1GPR | b.84.3.1 | 2.7.1.69 | - | Glucose permease IIa domain, IIa-glc (Bacillus subtilis)
 | 1CJY | | 3.1.1.4 | Zn2+ |
137.86 | 1AVA | c.1.8.1 | 3.2.1.1 | Ca2+ | Plant alpha-amylase (Barley)
137.86 | 1CR1 | c.37.1.11 | 2.7.7 | Sulfate | g4p, DNA primase, helicase domain (Bacteriophage T7)
137.57 | 1BSX | a.123.1.1 | Hormone | Triiodothyronine | Hu Thyroid hormone receptor beta
137.50 | 1CYD | c.2.1.2 | 1.1.1.184 | NADPH | Carbonyl reductase (Mouse)
136.80 | 1DF0 | d.3.1.3 | 3.4.22.17 | Ca2+ | Calpain (calcium dependent protease) (Rat)
136.60 | 1QQT | c.26.1.1 | 6.1.1.10 | Zn2+ | Methionyl-tRNA synthetase
136.54 | 1UAA | c.37.1.13 | 3.6.1 | ADP | DEXX box DNA helicase (E. coli)
136.52 | 1E5M | c.95.1.1 | 2.3.1.41 | Acyl group | 3-ketoacyl-ACP synthase II (Synechocystis sp.)
136.31 | 1VNS | a.111.1.3 | 1.11.1.10 | Vanadate | Chloroperoxidase (Curvularia inaequalis)
136.26 | 1FL1 | b.57.1.1 | Viral protein | K+ | Protease (Kaposi's sarcoma-associated herpes
136.05 | 1ELU | c.67.1.3 | lyase | K+ | Cystine C-S lyase (Synechocystis sp.), Fe-S assembly
135.87 | 1QBK | a.118.1.1 | Nuclear transport | GNP, SeM, Mg2+ | Hu-Karyopherin beta2 nuclear transporter
135.87 | 1ZPD | c.36.1.1 | 4.1.1.1 | Mg2+, | Pyruvate decarboxylase (Zymomonas mobilis)
135.69 | 1FUR | a.127.1.1 | 4.2.1.2 | Malate | Fumarase (E. coli)
135.23 | 1IHP | c.60.1.3 | 3.1.3.8 | Sulfate | Phytase (myo-inositol-hexakisphosphate-3-phosphohydrolase)
135.15 | 1MPY | d.32.1.3 | 1.13.11.2 | Fe2+ | Catechol 2,3-dioxygenase (Pseudomonas putida)

a The C-termini of the proteins in bold were in the DALI alignment used (MOTIFMAKER subprogram, PCPMer) to define the matrices for the
motifs.



MOTIFMINER will give the highest scores to sequences 
that match at these positions. Comparison of the protein- 
binding molegos from the three superfamilies revealed 
that similar binding sites bound different metals, and that 
the key amino acids that dictated the metal ion choice 
were indeed the most conserved. 

Table VI summarizes the metal-binding site and distances
to nearby residues (within 3 Å of the metal ion except for the
DNase 1 family representatives, where the metal is more
loosely bound) of all the enzymes in this study as a function of
the preferred metal ion for catalysis (which is not always
identical with that used for the crystal structure determination).
The metal ions in several of the structures have bonds
to substrates and water molecules, which for the sake of
clarity have not been included in Table VI. For example, in
the 1DE9 structure of HuAPE1 with Mn2+, which is tetrahedrally
coordinated, there are additional bonds to oxygen
atoms in the substrate DNA. The Ca2+ ion in the synaptojanin
(1I9Y) structure has 6 ligands within a 3.5 Å radius, of
which only two are from the protein. The rest are water molecules.
In 1QIP, the single Zn2+ atom is coordinated by four protein
ligands, and is also very close to the two oxygen atoms in the
O2 molecule in the active site. A full summary of the bonds to



each metal ion in the PDB structures is included in Table 
VII, which is given as supplementary information. 
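
The kind of distance tabulation summarized in Table VI can be sketched
in a few lines of Python. This is a minimal illustration under stated
assumptions, not the procedure used in this study: it assumes
fixed-column PDB coordinate records, treats any HETATM record whose
residue name is in a small illustrative `METALS` set as a metal ion,
and reports only protein (ATOM) atoms within a cutoff, mirroring the
omission of water and substrate ligands from the table. The helper
names (`format_atom`, `parse_atoms`, `metal_ligands`) are hypothetical,
and the coordinates in the demonstration are synthetic rather than
taken from any deposited structure.

```python
import math

# Residue names treated as metal ions (an illustrative, incomplete set).
METALS = {"ZN", "MN", "MG", "CA", "FE", "FE2", "PB", "SM"}


def format_atom(record, serial, name, resname, chain, resseq, x, y, z):
    """Build one fixed-column PDB coordinate record (used only for the demo)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the standard PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def metal_ligands(atoms, cutoff=3.0):
    """Map each metal ion to the protein atoms within `cutoff` angstroms."""
    result = {}
    for metal in atoms:
        if metal["record"] != "HETATM" or metal["resname"] not in METALS:
            continue
        near = []
        for atom in atoms:
            if atom["record"] != "ATOM":  # skip waters, substrates, other ions
                continue
            d = math.dist(metal["xyz"], atom["xyz"])
            if d <= cutoff:
                near.append((f'{atom["resname"]}{atom["resseq"]}', atom["name"], round(d, 2)))
        key = (metal["resname"], metal["chain"], metal["resseq"])
        result[key] = sorted(near, key=lambda item: item[2])
    return result


# Synthetic site: one Zn ion, two protein ligands, one distant atom, one water.
demo = "\n".join([
    format_atom("ATOM", 1, "OE1", "GLU", "A", 99, 0.0, 2.01, 0.0),
    format_atom("ATOM", 2, "NE2", "HIS", "A", 126, 2.03, 0.0, 0.0),
    format_atom("ATOM", 3, "CB", "ALA", "A", 50, 10.0, 0.0, 0.0),
    format_atom("HETATM", 4, "ZN", "ZN", "A", 201, 0.0, 0.0, 0.0),
    format_atom("HETATM", 5, "O", "HOH", "A", 301, 0.0, 0.0, 2.2),
])
for metal, ligands in metal_ligands(parse_atoms(demo)).items():
    print(metal, ligands)
```

Raising `cutoff` (for example to 3.5) would be one way to examine more
loosely bound sites such as those described for the DNase 1 family.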

All the metals are bound tightly by at least one carboxylic oxygen,
from an aspartate or a glutamate. The other residues in the binding
site differ in a fashion that indicates their metal ion specificity,
but the exact pattern must be determined by a more complete comparison
of the molegos in other members of the families. As expected from
previous analysis of hydration 25 and bonding patterns of metal ions in
smaller complexes, 26 those enzymes preferring Mg²⁺ and Ca²⁺ 21 have
predominantly oxygen ligands, such as the carboxyl groups of glutamate
and aspartate, in the metal-binding site. The ions Mg²⁺, Mn²⁺, and Ca²⁺
are relatively close to one another in their "hardness" 27-29 and also
share similar binding elements. Although Mn²⁺ is used in
crystallographic structures (e.g., that for APE1) as a more
electron-dense replacement for Mg²⁺, Mn²⁺ has a much wider variation in
the type of contacts it makes with protein ligands. The sites in the
enzymes studied here that preferentially use Mn²⁺, such as the Mre11
nuclease 30 and the Ser/Thr protein phosphatase of phage λ, 31 combine
carboxyls, carbonyl oxygens of
Asn or Gln, and imidazole nitrogens. The "softer" ions,

TABLE V. Highest Scoring Proteins Found by MOTIFMINER in the ASTRAL40 Database Starting With
PCP-Motifs Identified for an Alignment of IL-1 Related Proteins

PCPMer score  PDB ID  SCOP        Bound ion     Description
1783          1L2H    b.42.1.2    No            Interleukin-1 beta (Hu) ᵃ
1682          1ILR    b.42.1.2    No            Interleukin-1 receptor antagonist protein (Hu) ᵃ
1658          2ILA    b.42.1.2    No            Interleukin-1 alpha (Hu) ᵃ
1638          1A28    a.123.1.1   No            Progesterone receptor (Hu)
1627          1DL2    a.102.2.1   Ca²⁺          Class I alpha-1,2-mannosidase, catalytic domain (S. cerevisiae)
1618          1IVY    c.69.1.5    No            Human "protective protein," HPP (Hu)
1611          1HLE    e.1.1.1     No ᵇ          Elastase inhibitor (Horse)
1610          1MQS    e.25.1.1    No            Sly1p protein (S. cerevisiae)
1596          1AUI    d.159.1.3   Fe³⁺, Zn²⁺    Protein phosphatase-2B (calcineurin A subunit) (Hu)
1595          1M7S    e.5.1.1     heme          Catalase I (Pseudomonas syringae)
1589          1DMU    c.52.1.4    Ca²⁺          Restriction endonuclease BglI (B. subtilis)
1588          1J5W    d.104.1.1   No            Glycyl-tRNA synthetase (GlyRS) alpha chain (Thermotoga maritima, TM0216)
1588          1C3P    c.42.1.2    Zn²⁺ ᶜ        HDAC homologue (Aquifex aeolicus)
1584          1AYM    b.10.1.4    Zn²⁺ ᵈ        Rhinovirus coat protein (Hu rhinovirus 16)
1578          1M0Z    c.10.2.7    No            von Willebrand factor binding domain of glycoprotein Ib alpha (Hu)
1569          1BFG    b.42.1.1    No            Basic FGF (FGF2) (Hu) ᵃ
1564          1CIP    c.37.1.8    Mg²⁺          (A32-60, A182-347) Transducin (alpha subunit) (Rat)
1559.50       1LL7    c.1.8.5     No            (A:36-292, A:355-427) Chitinase 1 (Fungus (Coccidioides immitis))
1550          1EU1    c.81.1.1    Mo⁶⁺, Cd²⁺    Dimethylsulfoxide reductase (DMSO reductase) (Rhodobacter sphaeroides)
1549          1MKF    b.116.1.1   No            Viral chemokine binding protein m3 (Murine herpesvirus 4, Muhv-4)
1540          1LST    c.94.1.1    No            Lysine-, arginine-, ornithine-binding (LAO) protein (Salmonella typhimurium)
1536          1QGI    d.2.1.7     Sulfate       Endochitosanase (Bacillus circulans)
1534          1B6C    d.144.1.1   Sulfate       Type I TGF-beta receptor R4 (Hu)
1530          1CXP    a.93.1.2    Heme, Ca²⁺    Myeloperoxidase (Hu)
1523          1FN9    d.196.1.1   Zn²⁺          Outer capsid protein sigma 3 (Reovirus)
1519          1E6U    c.2.1.2     SO₄²⁻         GDP-4-keto-6-deoxy-d-mannose epimerase/reductase (E. coli)
1516          1GKY    c.37.1.1    SO₄²⁻, GMP    Guanylate kinase (S. cerevisiae)
1497          1IOW    c.30.1.2    Mg²⁺          D-Ala-D-Ala ligase, N-domain (1-96) (E. coli, gene ddlB)
1492          2BPA    b.10.1.1    No            Bacteriophage phi-X174 capsid proteins
1491          1RUX    b.13.2.2    No            Adenovirus hexon (Hu adenovirus type 5)

ᵃ The four proteins in the initial DALI alignment used to define PCP-motifs are bold.
ᵇ Calcium ion identified in structure mediates a lattice contact.
ᶜ Not in crystal structure.
ᵈ Nonenzymatic Zn²⁺ site between the subunits of the viral proteins on the surface of the virus.



Zn²⁺ and Fe²⁺, 28-32 can also be coordinated by nitrogens, which are
presented by the molegos of the dimetallic phosphatases, the
dioxygenases and related VOCs. In sites where Zn²⁺ plays a structural
role, it is typically coordinated by cysteine and histidine
residues. 33 However, no cysteine ligands are present in the active
sites of the metalloenzymes of this study, and Zn is predominantly
bound by carbonyl oxygens and histidine in these examples. The
dimetallic phosphatase and dioxygenase boxes that are specific for Fe²⁺
have conserved histidines in the binding positions, reflecting this
ion's affinity for imidazole nitrogen.

Except for the DNase 1 family, the binding distances between the metal
ion and the ligands are the same to within 0.1 Å in the crystal
structures from each family. These results suggest that the basic
architecture of the metal sites can be adapted to function by discrete
sequence alterations that dictate metal ion specificity. In the
dimetallic phosphatases, again regardless of metal ion bound, there is
also a shared ligand between the two metals that are asymmetrically
bound. This shields the metals from one another and allows two metal
ions to occupy about the same space that the single ones do in the
other sites.



The metal ion in the two DNase 1 superfamily proteins available for
analysis is more loosely bound to the residues in the active site than
is the case for the other two families. This is consistent with a
previous analysis for Mg²⁺ binding, which indicated that this metal can
accept up to only three negatively charged ligands, and fewer depending
on the solvent accessibility of the binding site. 34 The only structure
of these proteins that is consistent with the distances in the other
two families is that containing Pb²⁺, an element that does not support
catalysis by APE1. Note that both the synaptojanin and APE1 structures
have substrate bound, and the metal ion has ligands to the substrate in
both cases (Table VII, supplementary information). MD simulations have
suggested that the position of the metal ion differs in the free enzyme
before and after cleavage of the substrate (Oezguen et al.,
forthcoming).

DISCUSSION 

This report shows that the PCPMer program can be used 
to analyze similar elements in the architecture of several 
families of metal-binding proteins, and distinguish homologues. The
PCP-motifs defined for three distinct types of

TABLE VI. Residues in the Metal-Binding Sites of the Proteins in This Study, as a Function of Metal Ion

Protein (PDB file name); metal ion
    Residue      Atom         Distance to metal ion

HuAPE1 (PDB structures for this enzyme bound to 3 different metal ions); Mn, Sm, Pb ᵃ
    PDB file:                 1DE9         1E9N         1BDC
                                           Pb1 Pb2
    ASP70        OD1          4.13 Å       -            2.52 Å
    GLU96        OE1          -            1.99 Å       2.75 Å
    GLU96        OE2          2.42 Å       -            2.32 Å
    ASP210       OD2          5.38 Å       2.79 Å       6.8 Å
    ASN212       ND2          -            2.98 Å       -
    ASP308       OD2          3.18 Å       -            4.4 Å


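Synaptojanin (1I9Z); Ca
    [illegible]
    GLU597       OE2          [illegible]
    ASP838       [illegible]

N5P (1USH); 2 Zn
    [illegible]  [illegible]  [illegible]
    HIS43        NE2          [illegible]
    [illegible]  [illegible]  [illegible]  [illegible]
    ASN116       OD1          -            [illegible]
    GLN254
    HIS217       NE2          -            2.09 Å
    HIS252

Mre11 nuclease (1II7)
                                           MN404
    [illegible]
    [illegible]  [illegible]  [illegible]  2.41 Å ᵇ
    [illegible]  OD1          -            2.17 Å
    HIS173       NE2
    HIS206       ND1          -            2.48 Å
    HIS208

Pig acid phosphatase (1UTE)
                                           FE2
    [illegible]
    [illegible]  OD2          [illegible]
    TYR55        OH           [illegible]
    [illegible]  NE2          [illegible]
    [illegible]  OD1          -            2.24 Å
    HIS186       -            -            2.37 Å
    HIS221       ND1

Ser/Thr protein phosphatase
                              MN1          [illegible]
    ASP20        OD2          2.39 Å
    [illegible]  -            -            2.31 Å ᵇ
    [illegible]  [illegible]  [illegible]
    [illegible]  -            2.10 Å
    [illegible]  -            2.18 Å
    HIS186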


Catechol 2,3-dioxygenase (1MPY); Fe
    HIS214       NE2          2.50 Å
    GLU265       OE1          2.29 Å

4-hydroxyphenylpyruvate dioxygenase ([illegible]); Fe
    HIS161       NE2          2.18 Å
    HIS240       NE2          2.08 Å
    GLU322       OE1          1.96 Å

2,3-dihydroxybiphenyl 1,2-dioxygenase (1HAN); Fe
                              FE1          FE2 ᶜ
    HIS146
    HIS189       NE2          -            2.44 Å ᶜ
    HIS210       NE2          2.25 Å
    GLU260       OE1          1.96 Å

Human glyoxalase (1QIP); Zn
                              ZN
    GLN33        OE1          2.03 Å
    GLU99        OE1          2.01 Å
    HIS126       NE2          2.03 Å
    GLU172       OE1          1.99 Å






ᵃ APE1 has highest activity with Mg, but the 3 crystal structures have different metal ions. There are two Pb ions in the active site of 1E9N.

[...] at the surface and probably has no effect on the active site.

metalloenzymes could be used in MOTIFMINER to identify the proteins in
the initial alignment, as well as homologues with related functions.
This indicates the PCPMer approach can aid in defining the function of
proteins in genomic databases, when combined with other tools for
identifying sequence similarity.

Identifying Molego Architecture and Using 
PCPMer to Define Function 

Assigning function to genomic sequences is a challenging problem that
requires new approaches. 36,37 As blocks of homology become smaller, it
becomes progressively more difficult to distinguish meaningful matches
from random ones. 38-40 As these results show, when given a
structure-based alignment of the sequences of homologous proteins,
PCPMer can be used to detect similar metal ion binding proteins. For
example, PCPMer scored a Ser/Thr protein phosphatase (1AUI) very highly
as a dimetallic phosphatase even though it was not included in the
initial alignment used to identify PCP-motifs in this family, and also
listed several polymerases, which have two metals in their active
centers, as being related to this family (Table II). The sensitivity of
the approach is not limited to metal-containing proteins. When molegos
were defined for the IL-1β family (Table V), the receptors for
progesterone and TGF-β were identified, as well as a viral
chemokine-binding protein. Our word-based approach can pinpoint
underlying structural and functional similarities in proteins,
regardless of the distance (and order) in the sequence between
conserved elements. Thus, it is a particularly potent tool to identify
areas of low but significant similarity in homologous proteins with
overall low identity. As we previously showed, this is a very difficult
task for other methods for determining sequence similarity. 4
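
The idea of scoring a sequence against a set of motifs regardless of
their order or spacing can be illustrated with a toy example. The
sketch below is not the MOTIFMAKER or MOTIFMINER algorithm and does not
reproduce PCPMer scores. It assumes, purely for illustration, that a
single property (the Kyte-Doolittle hydropathy index) stands in for the
physical-chemical property descriptors, that a motif profile is the
per-position mean of that property over aligned segments, and that a
sequence score is the sum of the best window score for each motif. All
function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index, used here as a one-property stand-in.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def profile_from_segments(segments):
    """Average the property at each position over equal-length aligned segments."""
    length = len(segments[0])
    return [sum(KD[s[i]] for s in segments) / len(segments) for i in range(length)]


def window_score(window, profile):
    """Score a window as minus the mean absolute deviation from the profile."""
    total = sum(abs(KD[aa] - target) for aa, target in zip(window, profile))
    return 0.0 - total / len(profile)


def best_match(sequence, profile):
    """Return (score, start index) of the best-scoring window for one motif."""
    n = len(profile)
    candidates = [(window_score(sequence[i:i + n], profile), i)
                  for i in range(len(sequence) - n + 1)]
    return max(candidates) if candidates else (float("-inf"), -1)


def sequence_score(sequence, profiles):
    """Sum the best window score per motif; motif order and spacing are ignored."""
    return sum(best_match(sequence, p)[0] for p in profiles)


# Two toy motifs; the segments are made up for the example.
profiles = [profile_from_segments(["DHE", "DHE"]), profile_from_segments(["GLV"])]
print(best_match("AAADHEAAA", profiles[0]))
print(sequence_score("DHEAAAGLV", profiles), sequence_score("GLVAAADHE", profiles))
print(sequence_score("AAAAAAAAA", profiles))
```

Because each motif is matched independently, swapping the order of the
two motifs in the toy sequence leaves the total unchanged, which is the
property the paragraph above describes.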

While segments of short sequence identity alone may indicate common
structure, 41 it is generally accepted that both sequence and
structural similarity are needed to establish functional homology. 42
For example, the DALI program, 13 which couples similar structure with
sequence, is able to generate more meaningful alignments than programs
such as CLUSTALW, 43 which rely on sequence alone. Hence, we limited
this study to protein families where several crystal structure
representatives were known. These examples show that viewing individual
elements as building blocks simplifies the analysis of residues that
mediate specific metal binding. Our results indicate that PCPMer can be
used to generate testable hypotheses about the function of novel
proteins identified by genomic sequencing that are unclassified by
conventional sequence analysis approaches.

The decomposition analysis of these proteins is particularly valuable
when used to relate variations between proteins to substrate
specificity and catalysis. Thus, it can play a useful role in protein
design. 23,24,44



Metal Ion Specificity Despite Similarities in the 
Metal-Binding Mechanisms 

One important result of our analysis was that while the 
molego structure, i.e., the protein architectural elements, 
of the active site within a family was relatively invariant, 
discrete changes in a few key residues may dictate the 
metal ion specificity for catalysis. As Tables VI and VII 
(supplementary information) indicate, the conserved amino 
acid positions alter with the metal type, within the limita- 
tions of the site geometry (the actual occupancy will of 
course also be affected by the relative concentrations of the 
metals in the biological environment). All three metal- 
binding sites have a key carboxylate linkage, and then 
other ligands that vary with the preferentially bound 
metal ion. The exact pattern of variation will require more 
comparisons of sequences from enzymes in the families. 
This result suggests that once a metal ion-binding site is 
defined, simple residue changes at defined positions can be 
made to alter its metal ion specificity. 
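
The comparison of ligand types across sites can be made concrete with a
small tally. The sketch below is illustrative only: it assumes the
standard PDB atom names for side-chain donor groups, groups them into
hypothetical categories (`DONOR_GROUPS`), and uses the residue and atom
pairs listed in Table VI for human glyoxalase (1QIP, Zn) and for
4-hydroxyphenylpyruvate dioxygenase (Fe). It says nothing about
affinities; it only counts which donor groups line each site.

```python
from collections import Counter

# Donor group for common metal-coordinating side-chain atoms (PDB atom names).
DONOR_GROUPS = {
    ("ASP", "OD1"): "carboxylate", ("ASP", "OD2"): "carboxylate",
    ("GLU", "OE1"): "carboxylate", ("GLU", "OE2"): "carboxylate",
    ("ASN", "OD1"): "carbonyl", ("GLN", "OE1"): "carbonyl",
    ("ASN", "ND2"): "amide nitrogen", ("GLN", "NE2"): "amide nitrogen",
    ("HIS", "ND1"): "imidazole", ("HIS", "NE2"): "imidazole",
    ("TYR", "OH"): "hydroxyl", ("SER", "OG"): "hydroxyl", ("THR", "OG1"): "hydroxyl",
    ("CYS", "SG"): "thiolate",
}


def donor_profile(ligands):
    """Count donor groups for a list of (residue label, atom name) pairs."""
    return Counter(DONOR_GROUPS.get((residue[:3], atom), "other")
                   for residue, atom in ligands)


# Residue/atom pairs as listed in Table VI.
GLYOXALASE_ZN = [("GLN33", "OE1"), ("GLU99", "OE1"), ("HIS126", "NE2"), ("GLU172", "OE1")]
HPPD_FE = [("HIS161", "NE2"), ("HIS240", "NE2"), ("GLU322", "OE1")]

print(dict(donor_profile(GLYOXALASE_ZN)))
print(dict(donor_profile(HPPD_FE)))
```

Both sites share a carboxylate ligand, while the remaining donors
differ, which is the pattern described in the paragraph above.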

The speed of the PCPMer program is now sufficient to scan for
functional homologues in larger, genomic sequence databases (Bin Zhou
et al., forthcoming). We are also testing more automatic approaches to
PCP-motif generation, using for example a molego library assembled from
existing data, such as that in PFAM. 45,46 Structural comparisons of
the individual elements from many proteins, as described here, should
establish a basic protein dictionary of the amino acid words that make
up complex proteins.

CONCLUSIONS 

1. The metal-containing active sites of three distinct enzyme groups,
DNase 1 homologues, dimetallic phosphatases, and dioxygenases, can be
decomposed into molegos, areas of conserved sequence and structure.
The dimensions of the site and orientation of the molegos to the
metal ions vary little across the superfamily members, even in
homologues that have quite different overall activity.

2. The PCPMer program can be used to mine sequence 
databases and identify proteins with functional and 
structural similarities to a given protein family. 

3. The residues in the binding site created by the molegos 
dictate the specificity for the type of metal ion bound by 
the metalloenzyme. The specific residue interactions 
with the metal ion observed in the enzymes in this 
study are consistent with rules established by previous 
biophysical studies of metal ion binding affinities.